Parallel Dense Cholesky Factorization
Authors
Edward Rothberg and Anoop Gupta, Department of Computer Science, Stanford University, Stanford, CA 94305
Report title: The Performance Impact of Data Reuse in Parallel Dense Cholesky Factorization
Subject terms: hierarchical memory machines, Cholesky factorization
Number of pages: 28
Abstract
This paper explores performance issues for several prominent approaches to parallel dense Cholesky factorization. The primary focus is on issues that arise when blocking techniques are integrated into parallel factorization approaches to improve data reuse in the memory hierarchy. We first consider panel-oriented approaches, where sets of contiguous columns are manipulated as single units. These methods are natural extensions of the column-oriented methods that have been widely used previously. On machines with memory hierarchies, panel-oriented methods significantly increase the achieved performance over column-oriented methods. However, we find that panel-oriented methods do not expose enough concurrency for problems that one might reasonably expect to solve on moderately parallel machines, which significantly limits their performance. We then explore block-oriented approaches, where square submatrices are manipulated instead of sets of columns. These methods greatly increase the amount of available concurrency, thus alleviating the problems encountered with panel-oriented methods. However, a number of issues, including scheduling choices and block-placement issues, complicate their implementation. We discuss these issues and consider approaches that solve the resulting problems. The resulting block-oriented implementation yields high processor utilization levels over a wide range of problem sizes.
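To make the block-oriented organization concrete, the following is a minimal sequential sketch in Python. It is not the authors' implementation: the function names, the list-of-lists storage, and the block-size parameter `b` are all assumptions chosen for illustration, and no scheduling or block placement is modeled. The comments only mark which per-block operations are independent of one another within a step, which is where a parallel version could find concurrency.

```python
import math

def block_ranges(n, b):
    """Split indices 0..n-1 into contiguous (start, stop) ranges of width b."""
    return [(s, min(s + b, n)) for s in range(0, n, b)]

def factor_diag(a, k):
    """Unblocked Cholesky of the diagonal block k, in place (lower triangle)."""
    lo, hi = k
    for j in range(lo, hi):
        a[j][j] = math.sqrt(a[j][j])
        for i in range(j + 1, hi):
            a[i][j] /= a[j][j]
        # Update the rest of the diagonal block with column j.
        for c in range(j + 1, hi):
            for i in range(c, hi):
                a[i][c] -= a[i][j] * a[c][j]

def solve_block(a, i_blk, k):
    """Overwrite block (i_blk, k) with X, where X * L_kk^T = A_ik."""
    lo, hi = k
    for i in range(*i_blk):
        for j in range(lo, hi):
            s = sum(a[i][p] * a[j][p] for p in range(lo, j))
            a[i][j] = (a[i][j] - s) / a[j][j]

def update_block(a, i_blk, j_blk, k):
    """A_ij -= L_ik * L_jk^T, touching only entries on or below the diagonal."""
    lo, hi = k
    for i in range(*i_blk):
        for j in range(j_blk[0], min(j_blk[1], i + 1)):
            a[i][j] -= sum(a[i][p] * a[j][p] for p in range(lo, hi))

def block_cholesky(matrix, b):
    """Return lower-triangular L with L * L^T = matrix (assumed symmetric
    positive definite). The input is not modified."""
    n = len(matrix)
    a = [row[:] for row in matrix]
    blocks = block_ranges(n, b)
    for kk, k in enumerate(blocks):
        factor_diag(a, k)
        below = blocks[kk + 1:]
        # Each solve_block call writes only its own block: mutually independent.
        for i_blk in below:
            solve_block(a, i_blk, k)
        # Each update_block call writes only its own block: mutually independent.
        for ii, i_blk in enumerate(below):
            for j_blk in below[:ii + 1]:
                update_block(a, i_blk, j_blk, k)
    # Zero the untouched upper triangle so the result is a clean factor.
    return [[a[i][j] if j <= i else 0.0 for j in range(n)] for i in range(n)]
```

For example, `block_cholesky([[4.0, 12.0, -16.0], [12.0, 37.0, -43.0], [-16.0, -43.0, 98.0]], 2)` returns the factor with rows `[2, 0, 0]`, `[6, 1, 0]`, `[-8, 5, 3]`. A panel-oriented variant would instead treat all blocks that share a column range as one unit of work, which leaves fewer independent tasks per step.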
Similar resources
Scalable Parallel Algorithms for Solving Sparse Systems of Linear Equations
We have developed a highly parallel sparse Cholesky factorization algorithm that substantially improves the state of the art in parallel direct solution of sparse linear systems—both in terms of scalability and overall performance. It is a well known fact that dense matrix factorization scales well and can be implemented efficiently on parallel computers. However, it had been a challenge to dev...
Parallel and Fully Recursive Multifrontal Supernodal Sparse Cholesky
We describe the design, implementation, and performance of a new parallel sparse Cholesky factorization code. The code uses a supernodal multifrontal factorization strategy. Operations on small dense submatrices are performed using new dense-matrix subroutines that are part of the code, although the code can also use the BLAS and LAPACK. The new code is recursive at both the sparse and the dens...
Implementing a parallel matrix factorization library on the cell broadband engine
Matrix factorization (or often called decomposition) is a frequently used kernel in a large number of applications ranging from linear solvers to data clustering and machine learning. The central contribution of this paper is a thorough performance study of four popular matrix factorization techniques, namely, LU, Cholesky, QR, and SVD on the STI Cell broadband engine. The paper explores algori...
Parallel and fully recursive multifrontal sparse Cholesky
We describe the design, implementation, and performance of a new parallel sparse Cholesky factorization code. The code uses a multifrontal factorization strategy. Operations on small dense submatrices are performed using new dense matrix subroutines that are part of the code, although the code can also use the BLAS and LAPACK. The new code is recursive at both the sparse and the dense levels, i...
Experiments with Cholesky Factorization on Clusters of SMPs
Cholesky factorization of large dense matrices is an integral part of many applications in science and engineering. In this paper we report on experiments with different parallel versions of Cholesky factorization on modern high-performance computing architectures. For the parallelization of Cholesky factorization we utilized various standard linear algebra software packages and present perform...
POOCLAPACK: Parallel Out-of-Core Linear Algebra Package
In this paper, a parallel implementation of out-of-core Cholesky factorization is used to introduce the Parallel Out-of-Core Linear Algebra Package (POOCLAPACK), a flexible infrastructure for parallel implementation of out-of-core linear algebra operations. POOCLAPACK builds on the Parallel Linear Algebra Package (PLAPACK) for in-core parallel dense linear algebra computation. Despite the extreme s...